Towards Feasible PAC-Learning of Probabilistic Deterministic Finite Automata

Authors

  • Jorge Castro
  • Ricard Gavaldà
Abstract

We present an improvement of an algorithm due to Clark and Thollard (Journal of Machine Learning Research, 2004) for PAC-learning distributions generated by Probabilistic Deterministic Finite Automata (PDFA). Our algorithm is an attempt to keep the rigorous guarantees of the original one while using sample sizes that are not as astronomical as those predicted by the theory. We prove that our algorithm indeed PAC-learns in a stronger sense than the Clark-Thollard algorithm. We also perform very preliminary experiments: we show that on a few small targets (8-10 states) it requires only hundreds of examples to identify the target. We also test the algorithm on a web logfile recording about a hundred thousand sessions from an e-commerce site, from which it is able to extract some nontrivial structure in the form of a PDFA with 30-50 states. An additional feature, which in fact partly explains the reduction in sample size, is that our algorithm does not need any information about the distinguishability of the target as input.
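For readers unfamiliar with the model, the following is a minimal sketch of what a PDFA is: a toy two-state automaton from which strings are sampled and whose string probabilities can be computed exactly. It is not the algorithm of the paper, nor that of Clark and Thollard; the alphabet, the transition table, and the names `transitions`, `sample_string`, and `string_probability` are all assumptions made for illustration.

```python
import random

# Hypothetical toy PDFA over the alphabet {"a", "b"}; every number here is illustrative.
# transitions[state][symbol] = (probability, next_state).
# The special symbol STOP ends the string; probabilities out of each state sum to 1.
STOP = "$"
transitions = {
    0: {"a": (0.5, 1), "b": (0.25, 0), STOP: (0.25, None)},
    1: {"a": (0.25, 0), "b": (0.25, 1), STOP: (0.5, None)},
}

def sample_string(rng, start=0):
    """Draw one string from the distribution defined by the toy PDFA."""
    state, out = start, []
    while True:
        symbols = list(transitions[state])
        weights = [transitions[state][s][0] for s in symbols]
        sym = rng.choices(symbols, weights=weights, k=1)[0]
        if sym == STOP:
            return "".join(out)
        out.append(sym)
        # Determinism: the next state is fixed by the current state and the symbol.
        state = transitions[state][sym][1]

def string_probability(s, start=0):
    """Probability that the toy PDFA generates exactly the string s."""
    state, p = start, 1.0
    for ch in s:
        if ch == STOP or ch not in transitions[state]:
            return 0.0
        prob, state = transitions[state][ch]
        p *= prob
    # The string must end with the stop event in the state reached.
    return p * transitions[state][STOP][0]

if __name__ == "__main__":
    rng = random.Random(0)
    sample = [sample_string(rng) for _ in range(5)]
    print(sample)
    print(string_probability("ab"))
```

A learner in this setting sees only a sample such as the one produced by `sample_string` and must output an automaton whose distribution is close to the target's; the sample-size question discussed in the abstract is how many such strings are needed.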


Similar resources

PAC-learnability of Probabilistic Deterministic Finite State Automata

We study the learnability of Probabilistic Deterministic Finite State Automata under a modified PAC-learning criterion. We argue that it is necessary to add additional parameters to the sample complexity polynomial, namely a bound on the expected length of strings generated from any state, and a bound on the distinguishability between states. With this, we demonstrate that the class of PDFAs is...


PAC-learnability of Probabilistic Deterministic Finite State Automata in terms of Variation Distance

We consider the problem of PAC-learning distributions over strings, represented by probabilistic deterministic finite automata (PDFAs). PDFAs are a probabilistic model for the generation of strings of symbols, that have been used in the context of speech and handwriting recognition, and bioinformatics. Recent work on learning PDFAs from random examples has used the KL-divergence as the error me...


Learning Probabilistic Finite Automata

Stochastic deterministic finite automata have been introduced and are used in a variety of settings. We report here a number of results concerning the learnability of these finite state machines. In the setting of identification in the limit with probability one, we prove that stochastic deterministic finite automata cannot be identified from only a polynomial quantity of data. If concerned wit...


Probabilistic Deterministic Infinite Automata

We propose a novel Bayesian nonparametric approach to learning with probabilistic deterministic finite automata (PDFA). We define a PDFA with an infinite number of states (probabilistic deterministic infinite automata, or PDIA) and show how to average over its connectivity structure and state-specific emission distributions. Given a finite training sequence, posterior inference in the PDIA can ...


Learning probability distributions generated by finite-state machines

We review methods for inference of probability distributions generated by probabilistic automata and related models for sequence generation. We focus on methods that can be proved to learn in the inference in the limit and PAC formal models. The methods we review are state merging and state splitting methods for probabilistic deterministic automata and the recently developed spectral method for...



Publication date: 2008